-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add c14n for node and document #138
base: master
Are you sure you want to change the base?
Conversation
@saks interesting, there is a very high-level question I have first. Why do you think this wrapper crate is the right place to introduce this method? As you mention xml_c14n exists as a separate Rust crate. Maybe your work could also be packaged as a separate crate? |
c14n is a part of the standard libxml2 featureset, so it would be nice to have this functionality. This MR does a bit more, but from my reading of the readme it should be covered:
I'd like to have this for some validation in tests where I compare the output of some rust code against a canonicalized result of an xpath query. |
Having independent intrerest is certainly helpful, as is pointing out that c14n can be fully thought of as exposing libxml2 functionality. So I am feeling a lot more inclined to merge the PR - I will make some time to review it in more depth soon. Would you like to also add a review for this PR @tstenner ? Given you are an active user of the feature, you may have good visibility of any edges left to iron out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments regarding the c14n function parameters below. Other than that the API looks easy to use.
I didn't find anything wrong with the implementation. The output buffer might be shorter, but I'm not as knowledgable in this area.
The test coverage mirrors the "official" test cases and is quite extensive.
} | ||
|
||
/// Canonicalization specification to use | ||
#[derive(Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash, Debug, Default)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had implemented this as follows:
/// Canonicalization mode for [`Document.c14n`]
pub enum C14NMode {
/// [XML_C14N_1_0](https://www.w3.org/TR/2001/REC-xml-c14n-20010315)
Mode1_0,
/// [XML_C14N_1_1](https://www.w3.org/TR/xml-c14n11/)
Mode1_1,
/// [XML_C14N_EXCLUSIVE_1_0](https://www.w3.org/TR/xml-exc-c14n/)
ModeExclusive1_0(Vec<CString>),
}
impl C14NMode {
fn as_c_int(&self) -> c_int {
match self {
C14NMode::Mode1_0 => xmlC14NMode_XML_C14N_1_0 as c_int,
C14NMode::Mode1_1 => xmlC14NMode_XML_C14N_1_1 as c_int,
C14NMode::ModeExclusive1_0(_) => xmlC14NMode_XML_C14N_EXCLUSIVE_1_0 as c_int,
}
}
}
It doesn't map the C API 1:1, but makes it clear the inclusive_ns_prefixes
only apply to the exclusive c14n mode
} | ||
} | ||
|
||
unsafe fn c_obuf_into_output(c_obuf: xmlOutputBufferPtr) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first approach looked like this:
let buffer = xmlBufferCreate();
let c14n_res = xmlC14NExecute(
…,
xmlOutputBufferCreateBuffer(buffer, ptr::null_mut()),
);
let result = xmlBufferContent(buffer);
let c_string = CStr::from_ptr(result as *const c_char);
let node_string = c_string.to_string_lossy().
xmlBufferFree(
Ok(node_string)
It's shorter and fewer additional functions, but I'm probably missing some edge cases / needed error handling.
} | ||
|
||
/// Create a [Vec] of null-terminated [*mut xmlChar] strings | ||
fn to_xml_string_vec(vec: Vec<String>) -> Vec<*mut xmlChar> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the audience it might be feasible to require the inclusive namespaces to be a Vec<CString>
. After all, the namespaces to keep will most likely be hardcoded anyways.
|
||
/// Options for configuring how to canonicalize XML | ||
#[derive(Clone, Eq, PartialEq, Ord, PartialOrd, Hash, Debug, Default)] | ||
pub struct CanonicalizationOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With inclusive_ns_prefixes
as part of the CanonicalizationMode
(see below), there are only two options left. For those, I'd rather have them as arguments to the functions instead of having an options struct.
pub fn canonicalize( | ||
&self, | ||
options: CanonicalizationOptions, | ||
callback: Option<(xmlNodePtr, xmlC14NIsVisibleCallback)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had implemented the callback functionality as follows:
type IsVisibleCallback = Box<dyn Fn(&RoNode, &RoNode) -> bool>;
// thin pointer wrapper that calls supplied the callback instead
unsafe extern "C" fn _is_visible_wrapper(
data: *mut c_void,
node: xmlNodePtr,
parent: xmlNodePtr,
) -> c_int {
let callback = unsafe { &mut *(data as *mut IsVisibleCallback) };
// handling of parent nodes etc.
// …
(callback)(&RoNode(node), &RoNode(parent))) as c_int
}
impl Document{
/// Canonicalizes the XML document according to the W3C XML Canonicalization specification.
///
/// This method produces a canonical form of the XML document, which is useful for digital signatures
/// and document comparison. The canonicalization process ensures consistent representation of the XML content.
pub fn c14n(
&self,
mode: C14NMode,
with_comments: bool,
) -> Result<String, ()> {
self.c14n_with_visibility_callback(None, mode, with_comments)
}
/// Canonicalizes the document with an optional visibility callback
///
/// `is_visible_callback(node: &RoNode, parent: &RoNode)` is called for every
/// node having a parent, returning true if the node should be included in the
/// canonicalized output.
pub fn c14n_with_visibility_callback(
&self,
is_visible_callback: Option<IsVisibleCallback>,
mode: C14NMode,
with_comments: bool,
) -> Result<String, ()> {
// boxes the callback so it can be passed as a void pointer to [`_is_visible_wrapper`]
let (is_visible_fn, mut user_data) = match is_visible_callback {
Some(f) => (
Some(_is_visible_wrapper as unsafe extern "C" fn(_, _, _) -> _),
Some(Box::into_raw(f)),
),
None => (None, None),
};
let c14n_res = xmlC14NExecute(
self.doc_ptr(),
is_visible_fn,
user_data
.as_mut()
.map(|s| ptr::from_mut(s))
.unwrap_or(ptr::null_mut()) as *mut c_void,
mode.as_c_int(),
inclusive_ns_prefixes,
with_comments as c_int,
xmlOutputBufferCreateBuffer(buffer, ptr::null_mut()),
);
// …
}
}
Usage looks like this:
let input = r#"<ns1:root><ns2:foo x="1" a="2"/><!--cmt--><a/><b/></ns1:root>"#;
let callback = |_node: &RoNode, _parent: &RoNode| {
!(_parent.get_name() == "ns1:root" && _node.get_name() == "a")
};
let c14n_result = doc.c14n_with_visibility_callback(
Some(Box::new(callback)),
libxml::tree::document::C14NMode::Mode1_1,
false,
);
It's a lot more flexible, (I assume) a lot more complex once the ancestor nodes are handled probably as in your PR
}; | ||
|
||
impl Document { | ||
/// Canonicalize a document and return the results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring could use a more information or a link to the libxml2 documentation (which isn't that great either). The callback parameter is obvious and could benefit from an examples
@dginev Thanks for reconsidering it. Overall, I only had one obvious improvement to suggest and an opinion regarding the callback parameter (but then again, I don't use it, so it doesn't matter to me that much). The output buffer handling could be shorter, but that is something you know more about than me. |
There are not too many choices when it comes to "Canonical XML" implementation for rust. This PR is a compilation of ideas, approaches and test files from the following projects:
Work items
canonicalize
method forNode
andDocument
Node::ancestors
method (I personally find it very useful when using "nokogiri", plus it help in tests a lot). Although I don't feel too strongly about, can remove if it doesn't fit.Node::at_xpath
method. Yes, it is similar tofindnodes
(and can ultimately be implemented usingContext
), but with design from "nokogiri", I find it a lot more flexible when navigating a tree. Same as for the previous item, can update PR to remove it if necessary.Any feedback is welcome! I don't feel particularly attached to any approach here. Please let me know how to make this PR better.